Open Information Extraction for SOV Language Based on Entity-Predicate Pair Detection
Authors
Abstract
Open information extraction (Open IE) has mostly been studied for English, a subject-verb-object (SVO) language in which a relation between two entities tends to appear in the order entity, relational phrase, entity within a sentence. In SOV languages, however, both entities occur before the relational phrase, so the subject and the relation are separated by a long distance. Conventional Open IE methods, which mostly target SVO languages, have difficulty extracting relations from SOV-style sentences. In this paper, we propose a new method for extracting relations from SOV languages. Our approach addresses the long-distance problem by identifying an entity-predicate pair and recognizing a relation within the predicate. Furthermore, we propose a post-processing approach using a language model, so that the system can detect more fluent and precise relations. Experimental results on a Korean corpus show that the proposed approach is effective in improving the performance of relation extraction.
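The word-order issue described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the tag names (`ENT`, `PRED`), the function `extract_sov_triples`, and the pairing rule (first entity as subject, entity nearest the predicate as object) are all assumptions made for illustration, and the input is an English gloss arranged in SOV order rather than real Korean data.

```python
from typing import List, Tuple

# (surface form, tag). The tags "ENT" and "PRED" are hypothetical labels
# chosen for this sketch, not the tag set used in the paper.
Token = Tuple[str, str]
Triple = Tuple[str, str, str]


def extract_sov_triples(tokens: List[Token]) -> List[Triple]:
    """Toy extractor for SOV-ordered clauses.

    Entities are buffered until a predicate appears. The first buffered
    entity is treated as the subject and the last one (closest to the
    predicate) as the object. This pairing rule is an assumption.
    """
    triples: List[Triple] = []
    pending: List[str] = []  # entities seen since the last predicate
    for surface, tag in tokens:
        if tag == "ENT":
            pending.append(surface)
        elif tag == "PRED":
            if len(pending) >= 2:
                triples.append((pending[0], surface, pending[-1]))
            pending = []  # a predicate closes the current clause
    return triples


# English gloss in SOV order: "Alice Seoul was-born-in"
sentence = [("Alice", "ENT"), ("Seoul", "ENT"), ("was born in", "PRED")]
print(extract_sov_triples(sentence))
# → [('Alice', 'was born in', 'Seoul')]
```

The sketch only shows why the subject must be held until the clause-final predicate is reached; it does not model relation recognition within the predicate or the language-model post-processing step.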
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...
Use of Semantic Information to Interpret Thematic Information for Real-Time Sentence Comprehension in an SOV Language
Recently, sentence comprehension in languages other than European languages has been investigated from a cross-linguistic perspective. In this paper, we examine whether and how animacy-related semantic information is used for real-time sentence comprehension in an SOV word order language (i.e., Japanese). Twenty-three Japanese native speakers participated in this study. They read semantically re...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...